Table of contents

1. Introduction

2. Broad analysis
2.1 Boxplots
2.2 Densityplot
2.3 PCA

3. Specific analysis: Lapatinib
3.1 Analysis of biomarker and t-test between treated and untreated cells
3.1.1 Expression of biomakers before and after the treatment of Lapatinib
3.1.2 Searching for our own biomarkers
3.1.3 Boxplot of our biomarkers
3.1.4 ggBoxplot of our biomarkers’ basalexpression in breast cells with Kruskal-Wallis test
3.2. Paired t-test between the Lapatinib treated und untreated data
3.2.1 Paired t-test of our biomarker before and after treatment
3.2.2 Visualization of the genome-wide paired t-test in a vulcano plot
3.3 K-means
3.4 Spearman correlation

4. Main questions
4.1 Question 1: Does the cell doubling time correlate with reduced drug sensitivity?
4.1.1 Linear regression ~ Doubling-time
4.1.2 Table of conclusion
4.2 Question 2: Does Lapatinib have a similar effect on lung cancer as Erlotinib?
4.2.1 Heatmap
4.2.2 Plot Erlotinib and Lapatinib, colored by tissue
4.2.3 Pearson correlation
4.2.4 Lung cellines
4.3 Question 3: Does Lapatinib have a similar effect on brain metastases as it does on breast cancer?
4.3.1 Volcano plot
4.3.2 Heatmap for visualization
4.3.3 Investigation of correlation

1. Introduction

Our project is about Lapatinib and divided into three parts: Broad analysis, specific analysis and our three main questions. The main questions are the most important part of our project. They address whether cell doubling time correlates with reduced drug sensitivity and the effect of lapatinib on lung cancer and brain metastases. Twenty-five per cent of breast cancers overexpress ErbB2 / HER which leads to a more aggressive phenotype. Lapatinib blocks tyrosine kinases that belong to the HER2 receptor, which is increasingly found on the cell surface of cancer cells. HER2 is a receptor for the epidermal growth factor (EGF), which stimulates the cell division of cancer cells. By blocking these HER2 receptors, the unregulated growth of cancer cells can be controlled again. Lapatinib binds the ATP-binding site of the receptor’s intracellular domain and inhibits the survival and proliferation pathways of the tumor cell.

2. Broad analysis

2.1 Boxplots

As the very first step in our data analysis, we checked the normalization of the data.

In order to check whether there is a connection between the various drugs, we colored the plot by drug.

As shown, there is a connection between the different “boxes” and the different drugs. This could be the reason for different batches so we scale the data.

As you can see the scaling is necessary to compensate the batch effect. Scaling is then also applied on Untreated before it is further analyzed.

2.2 Densityplot

As an other method to get in touch with the data, we checked if there is a difference between the treated and untreated genexpression. The abline shows the 3 quantiles ( 25 % 50 % 75 % )

As assumed, the expression patterns don’t differentiate much.

2.3 PCA

Since the clustering method by performing k-means clustering was not successful, we performed a principal component analysis for dimensionality reduction as another method for identifying clusters or patterns while preserving as much information of the original data set as possible.

pca <- prcomp(t(Fold_Change), scale = TRUE)

By plotting a scree plot of the variation percentage, we can display how much variation a principal component captures from the original data.

As seen in the scree plot, the first three or four PCs have capture most of the information since the curve bends at an “elbow” at PC4 - this is our cutting off point.

To determine clusters of samples based on their similarity, we plotted PC1 and PC2 as well as a PC2 and PC3 while coloring by tissue type and drug respectively.

#plot PC1 and PC2 colored by tissue

plot(pca$x[,1], 
     pca$x[,2], 
     col = color_tissue,
     pch = 19, 
     xlab = paste("PC1 (",pca.var.per[1],"%)"), 
     ylab = paste("PC2 (",pca.var.per[2],"%)"))

As we can see, in contrast to the k-means method, clusters can be determined. Respecting the color annotation, it is clear to say that the samples cluster according to drug treatment.

3. Specific Analysis: Lapatinib

3.1 Analysis of biomarker and t-test between treated and untreated cells

3.1.1 Expression of biomakers before and after the treatment of Lapatinib

We found these genes as biomarkers in a paper and presented them in our first presentation. (Source of the paper: Basal-Like Breast Cancer Defined by Five Biomarkers Has Superior Prognostic Value than Triple-Negative Phenotype, by Maggie C.U. Cheang, David Voduc, Chris Bajdik, et al. Clin Cancer Res 2008; 14:1368-1376)

Unfortunately, we can not use them as biomarkers because expression either decreases only slightly or even increases. So we searched for our own biomarkers.

3.1.2 Searching for our own biomarkers

The log2FoldChange was examined from the mean of all genes before and after treatment with Lapatinib.
##     GDF15     NUPR1     DDIT3     TRIB3      ATF3      ASNS 
## 0.7150148 0.6843715 0.5794749 0.5753359 0.5603209 0.5305063

GDF3 Growth Differentiation Factor 3 belongs to the transforming growth factor beta (TGF-beta) superfamily and has some intrinsic activity.

NUPR1 stands for nuclear protein, transcriptional regulator, 1. It regulates the transcription during stress signals by empowering cells with resistance to stress.

DDIT3 encodes for the DNA damage-inducible transcript 3 protein, a member of the CCAAT/enhancer-binding protein which is a transcription factor. It is important for the response of cell stresses. The DNA damage-inducible transcript 3 protein induces cell cycle arrest and apoptosis in response to ER stress.

TRIB3 stands for Tribbles Pseudokinase 3. The kinase is induced by the transcription factor NF-kappaB. It is essential for cell proliferation.

ATF3 stands for activating transcription factor 3. It belongs to the mammalian activation transcription factor/cAMP responsive element-binding (CREB). ATF-3 is induced by physiological stress.

ASNS stands for Asparagine Synthetase: The enzyme catalyzes the production of the amino acid L-asparagine.

3.1.3 Boxplot of our biomarkers

After the treatment with Lapatinib, the expression of the biomarkers has dropped.

3.1.4 ggBoxplot of our biomarkers’ basalexpression in breast cells with Kruskal-Wallis test

The null hypothesis is: there is no difference between the groups. The Kruskal-Wallis test is for independent samples. It tests whether the central trends of several independent samples differ. Since the p-value of the Kruskal-Wallis test is significant, our biomarkers do not show the same central tendencies. It can be conclude that the genes are not closely related. Because Lapatinib is a breast cancer drug, we are more closely studying the expression of biomarkers in breast tissue. It can be seen that ASNS and GDF15 are very strongly expressed in breast cancer cells.
ggboxplot(df_breast, x="Biomarkers", y="Expression", color = "Biomarkers",
          add = "jitter", legend = "none")+
  rotate_x_text(angle = 45)+
  stat_compare_means(method = "kruskal.test", label.y = 15)+   # Add global kurskal wallis test p-value
  stat_compare_means(label = "p.signif", method = "t.test",
                     ref.group = ".all.", hide.ns = TRUE)      # Pairwise comparison against all

3.2. Paired t-test between the Lapatinib treated and untreated data

3.2.1 Paired t-test of our biomarker before and after treatment

The paired t-test shows a statistical significance of our biomarkers.
##    Marker       pvalue
## 1:  GDF15 2.612447e-12
## 2:  NUPR1 1.327187e-11
## 3:  DDIT3 9.979311e-20
## 4:  TRIB3 6.995573e-20
## 5:   ATF3 2.142510e-12
## 6:   ASNS 2.207251e-14

3.2.2 Visualization of the genome-wide paired t-test in a volcano plot

If the p-value of a gene is less than 0.05, it turns red. Thus, all significant genes are stained red.
with(volcano, plot(log2FoldChange, -log10(pvalue), pch=20, main="Volcano plot", xlim=c(-2.5,2)))

3.3 K-means

The silhouette-plot can be used to determin the amount of clusters. The main value, the average silhouette width is an indicator of the chosen amount of clusters (higher values indicate better clusters).

The negative bars indicate an incorrectly assigned datapoint. Overall, the fewer clusters you have, the better your silhouette width values are. This is due to the datapoints being very proximate to one another and not being able to define specific or different clusters.

3.4 Spearman correlation

Modeling of a linear relationship between cell doubling time and inoculation density.

# Spearman correlation
cor(Cellline_Annotation$Doubling_Time, Cellline_Annotation$Inoculation_Density, method = "spearman")
## [1] 0.5674381

As you can see from the plot, there is a positive correlation between inoculation density and doubling time. This means that the higher the inoculation density, the higher the doubling time, which suggests that the tumour grows slowly because the cells need more time to divide.

4. Main questions

4.1 Question 1: Does the cell doubling time correlate with reduced drug sensitivity?

As mentioned in our presentation, we want to create a model to predict GI50-values and thus to predict, if Lapatinib is a good choice. Therefore, we took several datasets and performed three regression analyses.

4.1.1 Linear regression ~ Doubling-time

The second linear model tries to predict the GI-50 value among the data of the Doubling-time.

i2.train = sample(1:nrow(combined2), 48)

dat2.train = combined2[i2.train, ]
dat2.test = combined2[-i2.train, ]

l2.train = lm(NegLogGI50Lap ~ Doubling_Time, data = dat2.train)

4.1.2 Table of conclusion

The following table shows important values.

Inoculation_Density Doublingtime Multiple regression
R-squared-value 0.0020077 0.0155763 0.0469458
F-statistic (p-value) 0.7623440 0.3980029 0.3731734
RMSE of test-dataset 0.9140916 1.0247092 0.9413304

The data is hard to discuss, because we used random values for our test and training dataset. Hence, different values on every run are obtained. We could use “selected values” (First 20) but this would be kind of cheating. But overall, the models do not seem to be appropriate for good linear regression.

4.2 Question 2: Does Lapatinib have a similar effect on lung cancer as Erlotinib?

As we read in the paper “Antitumor and antiangiogenic effect of the dual EGFR and HER-2 tyrosine kinase inhibitor lapatinib in a lung cancer model. (Diaz et al., 2010)”, Lapatinib and Erlotinib are said to have similar effects. To verify this, we first looked at the correlation of the drugs and then created a graph showing the difference between GI50 for a certain cell line and the middle GI50 for both drugs.

4.2.1 Heatmap

Erlotinib and Lapatinib have a strong correlation.

4.2.2 Plot Erlotinib and Lapatinib, colored by tissue

The difference from the GI50 for a particular cell line and the mean GI50 is plotted here for Erlotinib and Lapatinib.

4.2.3 Pearson correlation

## [1] 0.6528188

A Pearson correlation coefficient of ~ 0.65 confirms that these patterns are similar. One reason for this is that both Erlotinib and Lapatinib are EGFR inhibitors. Cell lines that were more sensitive are displayed as bars that project to the right of the mean. Cell lines that were less sensitive are displayed with bars projected to the left.

4.2.4 Lung cellines

To answer our more specific question, whether Lapatinib also has an effect on lung cancer, we will only look at the cell lines in lung cancer.

The difference from the GI50 for cell line from lung tissue and the mean GI50 is plotted here for Erlotinib and Lapatinib.

## [1] 0.9609488

A pearson correlation coefficent of ~ 0.96 suggests that Lapatinib has a similar effect on lung cancer as Erlotinib.

4.3 Question 3: Does Lapatinib have a similar effect on brain metastases as it does on breast cancer?

As the paper “Lapatinib Distribution in HER2 Overexpressing Experimental Brain Metastases of Breast Cancer” (Taskar et.al., 2012) suggests, Lapatinb crosses the brain-blood barrier. Since breast cancer tends to spread and form metastases in brain cells and Lapatinib is used in anti-breast cancer therapy, we wanted to analyse our data for similar effects after Lapatinib treatment in breast and cns tissue cells.

At first, we selected Lapatinib response genes by creating matrices with cns- and breast-cell line fold change values.

Next, we performed a paired t-test of treated and untreated cns and breast samples, respectively, to determine statistical significant fold change values. Since we are performing multiple testing over different cell lines from breast and cns tissues, we want to adjust our p value by performing a Benjamini Hochberg correction.

4.3.1 Volcano plot

For visualization of the biological significance (fold change values) and the statistical significance (adjusted p-values), we plotted our data in an interactive volcano plot.

4.3.2 Heatmap for visualization

To visualize the expression patterns of the top peak genes present in both tissue types we calculated a heatmap, comparing the expression profiles of the two tissue types.

The heatmap shows, that the top expression genes are regulated similarly in breast and in cns cells after treatment with Lapatinib.

4.3.3 Investigation of correlation

At last, we investigated the expression regulation in both tissue types by calculating the pearson correlation between the fold change values of the top peak genes.

## [1] 0.921812

The obtained correlation value of ~ 0.92 indicates that Lapatinib treatment yields a similar effect on gene expression in breast and cns tissue cells and therefore might be possible to treat cancer in brain metastases.